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Stability and diversity are two key properties that living entities share with spin glasses, where 
they are manifested through the breaking of the phase space into many valleys or local minima 
connected by saddle points. The topology of the phase space can be conveniently condensed into a 
tree structure, akin to the biological phylogenetic trees, whose tips are the local minima and internal 
nodes are the lowest-energy saddles connecting those minima. For the infinite-range Ising spin glass 
with p-spin interactions, we show that the average size-frequency distribution of saddles obeys a 
power law {ip(w)) ~ w~ D , where w — w(s) is the number of minima that can be connected through 
saddle s, and D is the fractal dimension of the phase space. 
PACS 75.10.Nr (principal), 87. 23. Kg 



The resemblance between models of adaptive evolution 
PJ and disordered spin systems is certainly not coin- 
cidental. In fact, two key features that any successful 
model of biological evolution ought to possess are stabil- 
ity and diversity, i.e., exactly the properties responsible 
for the complex thermodynamics of spin glasses ||. It 
is not a surprise therefore, that many of the tools and 
concepts of the statistical mechanics of disordered sys- 
tems have been applied to the study of the evolutionary 
process. In this contribution we show that such inter- 
change can be profitable to statistical mechanics too, in 
that a great deal of information about the phase space of 
spin-glass models can be condensed into a tree structure, 
in a standard procedure widely used in taxonomy and 
molecular phylogenetics Q|. As in the biological case, 
the geometric properties of this tree can be used to char- 
acterize the disordered system quantitatively. 

The main unifying concept in the investigations of the 
physics of disordered systems and evolutionary change 
is probably the notion of fitness or energy landscape. 
The concept of neighborhood among genotypes (config- 
urations), typically defined such that point mutations 
interconvert neighbors, allows us to view the set of all 
genotypes as the vertices of a graph with edges connect- 
ing neighboring configurations. A fitness landscape is 
then obtained by assigning a fitness value to each ver- 
tex. An explicit connection between those two research 
fields is obtained in the case of populations of asexually 
reproducing haploid organisms evolving on rugged fitness 
landscapes. In this case the genotypes are often modeled 
by configurations of N Ising spins s = (s 1; . . . , s^) with 
Si = ±1 so that a point mutation corresponds to a single 
spin flip. In the simplest case, evolutionary adaptation 
is described as an "adaptive walk" on the fitness land- 
scape whose statistical mechanics equivalent is the 
zero-temperature Glauber dynamics. The fitness func- 
tion assigns a random numerical value to each one of the 
2 N spin configurations. In this work we consider the p- 



spin landscapes B: 

TCp(s) = — Ji 1 i 2 ...i N s il s i2 ■ ■ ■ s iN (1) 

l<ii <i 2 ...<iN<N 

where the Ji 1 i 2 ...i JV are statistically independent Gaus- 
sian distributed random variables with mean zero and 
variance p\/(2N p ~ 1 ). The p-spin models form a class 
of tunably rugged landscapes similar to Kauffman's Nk- 
model jlj , which is not only more appealing to statistical 
mechanics but also is a more natural basis of landscape 
theory ||. In fact, for p = 2 the Hamiltonian H. p reduces 
to the SK model (?|] which exhibits a large number of 
highly correlated local minima, while the limit p — > oo 
corresponds to the random energy model (REM) || and 
yields an extremely rugged, uncorrelated landscape. Like 
the Nk-model, p-spin landscapes have been used repeat- 
edly to model evolutionary processes, see e.g. ||, 

The scenario that emerges from the replica approach 
to disordered spin systems is that the phase space com- 
posed of the 2 N spin configurations is broken into many 
valleys Q . The ease with which one valley can be reached 
from another one depends on the saddle points connect- 
ing them. More specifically, the energy of the lowest 
saddle point separating two local minima x and y is 



E\x,y] = min max7i i0 (z) 



(2) 



where ¥ X y is the set of all paths p connecting x and y 
by a series of subsequent spin- flips (or point mutations). 
The saddle-point energy E[., . ] is an ultrametric dis- 
tance measure on the set of local minima, see e.g. ||. 

Let us assume for a moment that the energy function 
is non-degenerate, i.e., H p (x) ^ T~Lp(y) whenever x ^ y. 
This is true for generic p-spin models with odd interaction 
order p. Then there is a unique saddle point s = s{x,y) 
connecting x and y characterized by TL p {s) — E[x,y\. 



FIG. 1: Barrier tree for a 3-spin model with N = 7. The 9 
minima (tips) and 8 connecting saddle points (internal nodes) 
are labeled by their spin configurations. 

Note that this definition of saddle point is more restric- 
tive than in differential geometry where saddles are not 
required to separate local optima jn^. To each saddle 
point s there is a unique collection of configurations B(s) 
that can be reached from s by a path along which the 
energy never exceeds TL p (s). In other words, the config- 
urations in B(s) are mutually connected by paths that 
never go higher than TL p (s). This property warrants to 
call B(s) the valley or basin below the saddle s. Fur- 
thermore, suppose that H p (s) < H p (s'). Then there are 
two possibilities: if s £ B(s') then B(s) C B(s'), i.e., the 
basin of s is a "sub-basin" of B(s'), or s ^ B(s') in which 
case B(s) (~1 B(s') — 0, i.e., the valleys are disjoint. This 
property arranges the local minima and the saddle points 
in a unique hierarchical structure which is conveniently 
represented as a tree, termed barrier tree (see Fig. [I]). 

In principle, barrier trees can be computed by means 
of the following simple recursive procedure: The tips of 
the tree are the local minima. The parent of tip x is the 
lowest-energy saddle point s that connects x to another 
local minimum. Analogously, the parent of a saddle point 
s is another saddle point s' that connects s to a local min- 
imum z that is not contained in the basin below s, i.e., 
z B(s). The "root" of the resulting tree is the saddle 
point s* with the highest energy, since by definition all 
local minima are contained in B(s*). Note that the sub- 
tree 1(s) that has the saddle s as its root has exactly the 
local minima in B(s) as its tips. The exact calculation 
of the barrier tree is a highly challenging computational 
problem and only recently some progress in that direc- 
tion has been achieved, mainly in the context of RNA and 
protein folding [O, [l^] (see also The reason is that, 

unless one has sufficient a priori knowledge on the land- 
scape, it is necessary to generate the complete landscape 
in order to find all local minima. Even for very small sys- 
tem sizes a simple-minded exhaustive search approach to 
evaluating Eq. (|^) would be hopeless as one must cal- 
culate all paths connecting all pairs of minima. In this 
contribution we use the program package barriers-0.9 



FIG. 2: Unrooted phylogenetic tree of extant species (white 
dots) obtained from the minimum fitness paths, shown by thin 
lines with saddle points (ancestors) indicated by black dots, 
does not necessarily coincide with the tree obtained from clus- 
tering methods that are based on sequence similarity, shown 
here with thick lines. The gray intensity is proportional to 
the fitness value. 

to construct the barrier tree from an energy sorted list of 
spin configurations in linear time. The algorithm explic- 
itly constructs the basins B(s) and subtrees T(s) 

The barrier tree can be viewed as a phylogenetic tree 
with a single common ancestor at the root. The evo- 
lutionary process leading to the extant species (i.e., the 
tips of the tree) is an adaptive walk on the rugged fitness 
landscape (]!]). Interestingly, according to definition (|^) a 
subtree connecting two tips corresponds to the evolution- 
ary path of minimum fitness cost which could be regarded 
as a generalization of the maximum parsimony princi- 
ple to rugged fitness landscapes. This differs from 
the usual approach in molecular biology where a flat fit- 
ness landscape and a diffusive behavior in sequence space 
is assumed to justify the reconstruction of phylogenetic 
trees based solely on sequence similarity, i.e., configura- 
tional overlap. In Fig. || we give an example in which 
the distance-based tree and the barrier tree have differ- 
ent topologies. In particular, the species labeled A and B 
are much closer in the barrier tree than in the maximum 
parsimony tree. It is interesting to note that barrier trees 
can be defined in a meaningful way also for continuous 
energy surfaces. Their nodes are the local minima and 
the saddle points satisfying Eq. (||), while the edges can 
be associated with saddle connections along which the 
energy varies monotonically. 

The definition of barrier trees becomes more compli- 
cated if the energy function is degenerate as in the case 
of p-spin models with even p, where H p (s) = H p (—s). 
The appropriate definition of the barrier tree is obtained 
by identifying the saddle points s and s with the same 
interior node of the tree provided (i) they have the same 
energy and (ii) they are connected by a path along which 
the energy does not exceed H p (s) = H p (s r ). In the non- 
degenerate case the trees are almost always binary. De- 
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generacies can occur also for geometric reasons since the 
same saddle point can connect more than two basins. 
One may, however, "expand" a non- binary interior node 
into a sequence of binary nodes at the same height. We 
use this technical trick here to simplify the computations. 
This procedure is justified because it affects only the sad- 
dle points above the saddle s that connects the ground- 
state and its mirror image, i.e., it affects only the nodes 
with large basins, far beyond the regime where the tree 
exhibits self-similarity. 

One important aggregate characteristic of a tree is 
the size-frequency distribution of its saddles or subtrees 
ip(w), where the size w — w(s) is, in the simplest case, 
just the number of local minima or tips in B(s). It is in- 
structive to consider first a few examples of simple ideal 
trees for which tp(w) can be calculated analytically. E.g., 
a symmetric binary tree of depth m > 1 has 2 m — 1 nodes, 
of which 2 m_1 are tips and the remaining 2 m ~ 1 — 1 are 
saddles. It can be easily shown that there are 2™ l ~ fe ~ 1 
saddles with sizes w — 2 k ; k — 1, . . . , m — 1, so that 
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if log 2 w G N 
if log 2 w £ N. 



(3) 



The other extreme is the asymmetric binary tree in which 
every left child is a tip. There are 2m — 1 nodes: m tips 
and one saddle with size w, 1 < w < m — 1. Hence 



ip(w) — 



1 



1 



for 2 < w < m. 



(4) 



We turn now to the analysis of the complex trees as- 
sociated to the random energy function (|l|) as produced 
by the program barriers-0 . 9. In this case the average 
number of tips increases exponentially with the number 
of spins e apN , where a p increases from a 2 = 0.199 to 
ctoo = In 2 (2). Log- log plots of the average size- frequency 
distributions of subtrees for particular p-spin models and 
the REM are shown in Fig. ||. In all cases the data are 
very well fitted by straight lines with negative slopes in 
the regime of high frequencies, suggesting then a power 
law form 



(5) 



where D can be viewed as the fractal dimension of the 
barrier tree and hence of the phase space of the disor- 
dered system. This result points out a very large num- 
ber of subtrees with a few tips and a very small number 
of subtrees with many tips. Moreover, it implies that 
there is no characteristic number of tips within subtrees. 
As illustrated by the gray data points in the panel for 
the REM, the same scaling law seems to hold true for 
the size-frequency distribution of a single instance of the 
landscape as well. The asymmetric scattering of points 
observed in the low frequency regime, i.e., (ip(w)) <C 1, 
is due to the few high energy sadddle points near the 
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FIG. 3: Log- log plots of the average frequency of subtrees 
(saddle points) with different numbers of tips (minima) for the 
p-spin landscapes with N = 18, and the REM with N = 16, 
averaged over 300 landscapes each. For the REM, a size- 
frequency distribution for a single landscape is shown as gray 
squares to demonstrate the scatter. 

root of the tree. It is interesting to note that many fre- 
quency distributions of taxonomic units containing var- 
ious numbers of subunits (e.g., species per genus) were 
found to be well described by power laws JL4|. It should 
be stressed that the scaling law (|J) is by no means a 
mere consequence of the existence of an underlying tree 
structure, as it is clear from the asymmetric binary tree 
example discussed before as well as from the study of dis- 
crete branching processes which generate different forms 
of size- frequency distributions |lq] . 

The average size-frequency distribution for the SK 
model displays a rather distinct behavior pattern which 
seems to indicate the existence of two different types of 
self-similar structures at different levels of the tree. These 
structures are characterized by straight lines with dis- 
tinct slopes, being joined by a short, almost flat curve 
corresponding to a crossover regime between the dom- 
inant structures. The first slope appears to be around 
D\ k 1.4, the second slope is L> 2 ~ 2. These results, how- 
ever, must be taken with caution since the size-frequency 
statistics is greatly impaired by the fact that the trees 
are typically very small in this model. For instance, for 
N = 18 the average number of tips is about 50 in the SK 
model as compared to the ~ 10 5 tips in the REM. 

In addition to being a global measure for the character- 
ization of rugged landscapes, to be contrasted with local 
measures such as the correlation length || |l6), the pa- 
rameter D also yields a measure of diversity since D is 
higher in systems where subtrees with one or a few tips 
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FIG. 4: Dependence of the exponents D on the reciprocal of 
the number of spins N. The data for p — 2 do not allow for 
the unambiguous estimate of an exponent D. 

are more numerous. An attempt to estimate the frac- 
tal dimension D for infinite system sizes is presented in 
Fig. |J. From these data one cannot discard the possibility 
that for very large systems the exponents will converge 
to D = 2 independently of p, in which case the value 
of D might be seen as an index that characterizes the 
universality class of the p-spin landscapes. Since the ex- 
act construction of barrier trees is at present feasible for 
systems of sizes up to N — 24 only, the estimate of D 
for larger systems must resort to a stochastic approach, 
probably in the spirit of the coalescent theory of popula- 
tion genetics Q , which focuses on the genealogy of a few 
sampled individuals rather than on the family tree of the 
entire population. 

The replica theory predicts a similar hierarchical struc- 
ture for the space of pure states, consisting of clusters 
within clusters. In particular, the probability density 
that a cluster has weight W (roughly the fraction of pure 
states within it) is || 



f(W) 



w y ~ 2 (i - wy 

T{y)T{l-y) 



(6) 



which for small weights reduces to a power law f(W) ~ 
W v ~ 2 . Here y G (0,1) is a complicated function of 
the physical parameters p and the temperature T. For 
instance, for the REM one has y = 1 — T/T c where 
T c = (4 In 2) ' . As the replica pure states space is 
a rather abstract construct (e.g., only a few low-energy 
local minima are pure states and the definition of clus- 



ters does not involve the notion of saddle points) a direct 
comparison with our results is not evident; albeit it is 
highly desirable since we are not aware of any attempt 
to verify Eq. (^J) via numerical simulations. 
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